 series forecasting


Temporal Functional Circuits: From Spline Plots to Faithful Explanations in KAN Forecasting

arXiv.org Machine Learning

Unlike MLPs, Kolmogorov-Arnold Networks (KANs) expose explicit learnable edge functions on every connection, enabling mechanistic explanation in time-series forecasting. This paper introduces Temporal Functional Circuits, a framework that transforms KAN edge functions from latent visualizations into faithful, temporally grounded explanations. Built on a gated residual KAN that decomposes forecasts into a linear base and a sparsely activated KAN correction, the framework (i) maps each edge to input lags via output-aware attribution, (ii) ranks edges by learned activation range, and (iii) validates faithfulness through edge-level interventions including zeroing and spline removal. Removing the learned B-spline component while retaining the base SiLU term degrades forecasts, providing evidence that the spline shape itself carries predictive value beyond the base activation. On four synthetic regimes of increasing complexity, the learned gate opens progressively wider as signal complexity grows. On regime-switching signals, gated KAN achieves 59% lower MSE than linear-only models. Across eight benchmarks, the gated architecture is competitive with linear, attention, and MLP alternatives, while providing interpretable edge functions that MLP-based corrections cannot offer.
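The decomposition and the edge-level interventions described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a one-step scalar forecast with one edge per input lag, uses the common KAN edge form (a weighted SiLU base plus a weighted spline), and replaces the learned B-spline with a piecewise-linear interpolant. All names (`edge`, `forecast`, `w_base`, `w_spline`, `mode`) are hypothetical.

```python
import math

def silu(x):
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def piecewise_linear(x, knots, values):
    """Linear interpolation over sorted knots, clamped at both ends.
    Stand-in for the learned B-spline (an assumption of this sketch)."""
    if x <= knots[0]:
        return values[0]
    if x >= knots[-1]:
        return values[-1]
    for i in range(len(knots) - 1):
        if knots[i] <= x <= knots[i + 1]:
            t = (x - knots[i]) / (knots[i + 1] - knots[i])
            return (1.0 - t) * values[i] + t * values[i + 1]

def edge(x, params, mode="full"):
    """Evaluate one edge function under an intervention mode.
    'full': SiLU base + spline; 'no_spline': SiLU base only; 'zero': edge removed."""
    if mode == "zero":
        return 0.0
    base = params["w_base"] * silu(x)
    if mode == "no_spline":
        return base
    spline = piecewise_linear(x, params["knots"], params["values"])
    return base + params["w_spline"] * spline

def forecast(lags, linear_w, gate, edges, mode="full"):
    """Linear base plus a gated sum of per-lag edge functions."""
    base = sum(w * x for w, x in zip(linear_w, lags))
    correction = sum(edge(x, p, mode) for x, p in zip(lags, edges))
    return base + gate * correction
```

Comparing `forecast(..., mode="full")` with `mode="no_spline"` and `mode="zero"` on held-out data mirrors the kind of faithfulness check the abstract describes; setting `gate` to 0 recovers the linear-only forecast.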


Why Model Selection Fails in Time Series Forecasting: An Empirical Study of Instability Across Data Regimes

arXiv.org Machine Learning

Time series forecasting models often exhibit inconsistent performance across datasets with varying statistical and structural properties. Despite the wide range of available forecasting techniques, it remains unclear whether model selection can be reliably guided by simple data characteristics. This paper investigates why rule-based model selection fails in time series forecasting by analyzing the relationship between data-regime descriptors and model performance. A descriptor-based framework is introduced to characterize time series using measurable properties, including trend strength, seasonality, noise level, and temporal dependence. Based on these descriptors, a rule-based selection mechanism is formulated to map data regimes to candidate forecasting models. The approach is evaluated on multiple real-world datasets across different domains and forecasting horizons. The results show that rule-based model selection achieves low accuracy, with correct model identification occurring in only a small fraction of cases. Significant discrepancies are observed between recommended and empirically optimal models, particularly in noisy and mixed regimes. Further analysis reveals that model performance is highly sensitive to both dataset characteristics and forecasting horizon, resulting in substantial ranking instability across scenarios. These findings explain why simple heuristic rules fail to generalize and demonstrate that forecasting performance cannot be reliably predicted using static, descriptor-based approaches. This study provides empirical evidence that model selection in time series forecasting is inherently context-dependent and highlights the need for more adaptive, data-driven strategies.
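To make the idea of descriptor-based, rule-based selection concrete, here is a minimal sketch of the kind of static rule the study reports as unreliable. The descriptor definitions (squared correlation with the time index for trend, autocorrelation at the seasonal lag and at lag 1), the 0.5 thresholds, and the model labels are all assumptions chosen for illustration; the paper's actual descriptors and rules are not given in the abstract.

```python
def autocorr(series, lag):
    """Sample autocorrelation at a given lag (0.0 for a constant series)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var

def trend_strength(series):
    """Squared correlation between the series and its time index."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    syy = sum((y - y_mean) ** 2 for y in series)
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)

def describe(series, period):
    """Collect simple data-regime descriptors for one series."""
    return {
        "trend": trend_strength(series),
        "seasonality": autocorr(series, period),
        "dependence": autocorr(series, 1),
    }

def select_model(desc):
    """Hypothetical static rule mapping descriptors to a model label."""
    if desc["seasonality"] > 0.5:
        return "seasonal_naive"
    if desc["trend"] > 0.5:
        return "linear_trend"
    if desc["dependence"] > 0.5:
        return "autoregressive"
    return "mean"
```

A rule like `select_model` ignores the forecasting horizon and any interaction between descriptors, which is consistent with the ranking instability the abstract reports.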


DecompKAN: Decomposed Patch-KAN for Long-Term Time Series Forecasting

arXiv.org Machine Learning

Accurate time series forecasting in scientific domains such as climate modeling, physiological monitoring, and energy systems benefits from both competitive predictions and model transparency: practitioners value understanding how a model transforms temporal features, not merely what it predicts. Transformer-based models achieve strong accuracy but their attention weights reveal only token-level relevance, not the functional transformations applied to each feature. This work proposes DECOMPKAN, a lightweight attention-free architecture that combines trend-residual decomposition, channel-wise patching, learned instance normalization, and B-spline Kolmogorov-Arnold Network (KAN) edge functions. Each KAN edge learns an explicit, inspectable 1D scalar function ϕ(x) over learned patch-embedding coordinates that can be directly visualized, offering a form of architectural transparency not directly available in attention-based or MLP-based architectures. On standard benchmarks, DECOMPKAN achieves best or tied-best MSE on 15 of 32 dataset-horizon combinations among selected published baselines, and achieves best or tied-best MSE on 20 of 36 comparisons (25 of 36 MAE; ties counted for all tied models) under a controlled same-recipe evaluation across 9 datasets including the physiological PPG-DaLiA benchmark. The architecture shows particular strength on datasets with smooth temporal dynamics (Solar 17%, ECL 10% vs.


Predict, Refine, Synthesize: Self-Guiding Diffusion Models for Probabilistic Time Series Forecasting

Neural Information Processing Systems

Diffusion models have achieved state-of-the-art performance in generative modeling tasks across various domains. Prior works on time series diffusion models have primarily focused on developing conditional models tailored to specific forecasting or imputation tasks. In this work, we explore the potential of task-agnostic, unconditional diffusion models for several time series applications. We propose TSDiff, an unconditionally-trained diffusion model for time series. Our proposed self-guidance mechanism enables conditioning TSDiff for downstream tasks during inference, without requiring auxiliary networks or altering the training procedure. We demonstrate the effectiveness of our method on three different time series tasks: forecasting, refinement, and synthetic data generation. First, we show that TSDiff is competitive with several task-specific conditional forecasting methods (predict). Second, we leverage the learned implicit probability density of TSDiff to iteratively refine the predictions of base forecasters with reduced computational overhead over reverse diffusion (refine). Notably, the generative performance of the model remains intact -- downstream forecasters trained on synthetic samples from TSDiff outperform forecasters that are trained on samples from other state-of-the-art generative time series models, occasionally even outperforming models trained on real data (synthesize).




What if We Enrich day ahead Solar Time Series Forecasting with Temporal Context Supplementary material

Neural Information Processing Systems

For both modalities, essential information such as geographic coordinates, elevation, and precise time-stamps is available. In this section, we provide a comprehensive explanation of the encoding process for each feature and conclude by presenting the hyperparameters of the model. For each time point, we have access to the following time features: the year, the month, the day, the hour and the minute at which the measurement was made. We use a cyclical embedding to encode these time features, discarding the year. For a time feature x, its corresponding embedding can be expressed as:

$$\sin\left(\frac{2\pi x}{\omega(x)}\right), \quad \cos\left(\frac{2\pi x}{\omega(x)}\right) \tag{1}$$

Submitted to 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
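The cyclical embedding in Equation (1) can be sketched as follows. The excerpt does not define ω(x); this sketch assumes it is the period of the feature (for example 24 for the hour), and the `PERIODS` table and function names are hypothetical.

```python
import math

# Assumed periods for each calendar feature (not specified in the excerpt).
PERIODS = {"month": 12, "day": 31, "hour": 24, "minute": 60}

def cyclical_embedding(x, period):
    """Map a periodic feature value to a (sin, cos) point on the unit circle."""
    angle = 2.0 * math.pi * x / period
    return (math.sin(angle), math.cos(angle))

def encode_timestamp(month, day, hour, minute):
    """Concatenate the (sin, cos) pairs of all time features; the year is discarded."""
    values = {"month": month, "day": day, "hour": hour, "minute": minute}
    features = []
    for name, period in PERIODS.items():
        features.extend(cyclical_embedding(values[name], period))
    return features
```

With this encoding, hour 23 and hour 0 land close together on the circle, which a raw integer encoding would not capture.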


Are Self-Attentions Effective for Time Series Forecasting?

Neural Information Processing Systems

Time series forecasting is crucial for applications across multiple domains and various scenarios. Although Transformers have dramatically advanced the landscape of forecasting, their effectiveness remains debated. Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches, highlighting the potential for more streamlined architectures. In this paper, we shift the focus from evaluating the overall Transformer architecture to specifically examining the effectiveness of self-attention for time series forecasting. To this end, we introduce a new architecture, Cross-Attention-only Time Series transformer (CATS), that rethinks the traditional transformer framework by eliminating self-attention and leveraging cross-attention mechanisms instead. By establishing future horizon-dependent parameters as queries and enhanced parameter sharing, our model not only improves long-term forecasting accuracy but also reduces the number of parameters and memory usage. Extensive experiments across various datasets demonstrate that our model achieves superior performance with the lowest mean squared error and uses fewer parameters compared to existing models. The implementation of our model is available at: https://github.com/dongbeank/CATS.



Scaling Law for Time Series Forecasting

Neural Information Processing Systems

Scaling laws that reward large datasets, complex models and enhanced data granularity have been observed in various fields of deep learning. Yet, studies on time series forecasting have cast doubt on the scaling behaviors of deep learning methods for time series forecasting: while more training data improves performance, more capable models do not always outperform less capable models, and a longer input horizon may hurt performance for some models. We propose a theory of scaling laws for time series forecasting that can explain these seemingly abnormal behaviors. We take into account the impact of dataset size and model complexity, as well as time series data granularity, particularly focusing on the look-back horizon, an aspect that has been unexplored in previous theories. Furthermore, we empirically evaluate various models using a diverse set of time series forecasting datasets, which (1) verifies the validity of the scaling law on dataset size and model complexity within the realm of time series forecasting, and (2) validates our theoretical framework, particularly regarding the influence of the look-back horizon. We hope our findings may inspire new models targeting time series forecasting datasets of limited size, as well as large foundational datasets and models for time series forecasting in future works.